-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIP-0027: Use a Union type for the Label field #318
Conversation
@ribasushi I cant add you as a reviewer for some reason but would be good if you 👁️ |
For golang this will work, but I am not sure what the effect of a double-stringlike-union is in rust-land ( I'd defer to @vmx and @dignifiedquire ) |
To be even more specific: I feel that this part of the proposal is a step in the wrong direction in terms of maintainability:
|
FIPS/fip-0027.md
Outdated
@@ -1,7 +1,7 @@ | |||
--- | |||
fip: "0027" | |||
title: Change type of DealProposal Label field from String to byte array | |||
author: Laudiacay (@laudiacay) | |||
title: Change type of DealProposal Label field from String to a Union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe first clarify that the original definition is not actually a String, but rather a golang String, which in turn means a fixed size byte array, with no actual validation for being characters asfaik
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea, thanks
This was motivated by a desire to make everything sane in rust. Rust's big "problem" is that it really doesn't want to make invalid CBOR Major Type 3 fields containing non UTF-8 bytes. Implementing a field in DealProposal struct that is either CBOR major type 2 or 3 is not a problem in rust. Rust will force a failure if someone tries to pack non-UTF8 bytes into major type 3 and that will be that.
True it's more complicated than just allowing a random byte string. What it gets us is no collateral damage of deals made across the upgrade boundary. I don't think living with this is significant maintenance burden. As the FIP mentions the HAMT has a field that is a union of cbor types and this has not caused us any maintenance problem over the past year. But if we decide we don't like this we can encourage people to move to major type 2 label fields, by changing go-fil-markets and messaging other deal makers more broadly, and then phase out major type 3 label fields in a later upgrade once the risk to deal making interruption is reduced. |
This makes sense to me as a workable direction, but that's not a strong endorsement. I'm still keen to hear other feedback. It could help a lot to see code snippets of how this would manifest in Go and Rust. |
Disclaimer: I know very little about blockchains or this part of Filecoin. So please read this as a comment from someone who is new to this and kind of an outsider perspective. The current problem is, that currently invalid non-spec compliant CBOR is produced. This means that third party CBOR tooling might not work with that data. We all agree that this should be fixed. The question is how. If I'd create something for Filecoin, being it an implementation, some analysis tool or whatever, I'd be more than happy, if I could just keep things simple and have as little moving parts as possible. Ideally I wouldn't need to put extra thought into how to process some label field. So if this field would just have a single CBOR type, that would be great. Now we come to the lack of my Filecoin knowledge. I don't know how much work/time/processing a migration of label fields take. If I understand correctly, with this proposal, we would need to go over every label field and check whether it is valid utf-8 or not. Then decide whether to keep the CBOR text string type, or change to CBOR byte string type. Would it make much difference to switch everything to the CBOR byte string type without any checking? Again, I don't know what is involved in this and it might just not be possible/too much effort: what if new deals would only use the CBOR byte string type from a certain point on? We then wait until there are no deals with the CBOR byte string type left and only then do the migration above. This surely is operational wise more effort. But it's a one time thing. Once done we have nice valid CBOR on-chain and never have to bother about it again. No additional complexity is added to neither protocol implementations, nor third party tools. |
@ribasushi Is the Design Rationale section not clear, or are you not convinced that a real issue exists? I'm happy to elaborate (as I indicated here, I may be hallucinating problems that don't actually exist).
@vmx Thanks for the feedback!
I will add a note about this to the FIP itself, thank you! |
@anorth A prototype in go-actors can be found here, as well as its integration in the go-fil-markets module and Lotus client. |
@arajasek no, I am not convinced you are actually solving the issue. Refer to the sentence from the original roadblock
At present, as @vmx very correctly noted, the on-chain protocol encodes an expectation that the client signature within PSD is done over invalid CBOR ( non-utf8 bytes within a type
What I am getting at is that what is on chain is not relevant, rather what is a signature verification made against is. Any change there will be equally painful. Moreover: a PSD message is very long lived ( potentially months ), and the CID of the serialization is also very tightly coupled with the FSM ( e.g. I again would like to reiterate my original proposal: fix the chain-storage asap to unblock lowest-level rust-tools, but keep everything Sig/CID-affecting using the status quo invalid CBOR representation, until we figure out a much more comprehensive way forward. |
@ribasushi Thanks for the reply! So, I agree -- we still have the problem for non-UTF-8 bytes. The chief difference between the old and new proposal is that in the old proposal all deals would fail (even those with empty labels!), whereas in the new proposal only non-UTF-8 labels will cause deals to fail. We can (and should) broadcast to users that this will be the case. To my knowledge, most tooling does not try to set non-UTF-8 strings. Does this make more sense now? Please push back if you're unconvinced. |
Thank you for clarifying, I missed that in my mental model / the FIP does not make this super clear. Let me rephrase this in riba-speak:
I no longer have strong objections to the proposal, with minor unease being:
If neither of these raise alarm: |
@ribasushi Yup, Riba-speak aligns well with my understanding here. I will add more detail about the custom CBOR-encoding, and the purpose it serves here (to mimic the existing string labels as much as possible).
I'm not sure I share your concern here. The union is really a temporary migration-type in my head, that will exist for the span of one network upgrade. In the network upgrade after the one that introduces this, we can migrate everything to bytes (see the Future Considerations section). |
FIPS/fip-0027.md
Outdated
@@ -16,38 +16,78 @@ spec-sections: | |||
|
|||
## Simple Summary | |||
<!--"If you can't explain it simply, you don't understand it well enough." Provide a simplified and layman-accessible explanation of the FIP.--> | |||
DealProposal's Label field is currently a String, but is not enforced to be UTF-8. This is not only outside the CBOR string specification, but can also be a source of bugs and difficulties in client implementations due to variations in String libraries. Rather than enforce it as UTF-8 everywhere, this FIP changes the type to be raw bytes for ease of implementation, bug avoidance, and CBOR compliance. | |||
Makers of deals on Filecoin can specify a "label" for their deals. This label is stored on the Filecoin blockchain, so it is important that there be no risk that a "bad" label can cause issues in nodes on the Filecoin network. Today, Filecoin does not enforce that this label meet UTF-8 encoding, even though it is increasingly the case that systems assume all strings are UTF-8 (see [here](https://en.wikipedia.org/wiki/Popularity_of_text_encodings#Popularity_internally_in_software) for more). This FIP proposes making a change to remove this abnormality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important to keep the note about non-utf8 strings being invalid CBOR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien It's in the abstract below, but I don't think it belongs in the "Simple Summary" ("a simplified and layman-accessible explanation of the FIP")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @Stebalien here. I don't think the point is
even though it is increasingly the case that systems assume all strings are UTF-8
Even if it wouldn't be the case this should be fixed, because it is not a good idea to create invalid CBOR. It's independent of whether systems tend to use UTF-8 for strings or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vmx Yeah, that's a fair point. I'll add the detail in simple language.
Instead of going
string -> bytes
, gostring -> Union(byte,string)